Abstract Building higher-dimensional copulas is generally recognized as a difficult
problem. Regular-vines using bivariate copulas provide a flexible class of
high-dimensional dependency models. In large dimensions, the drawback of the
model is its exponentially increasing complexity. Recognizing some of the
conditional independences is one way to reduce the number of levels of the
pair-copula decomposition, and hence to simplify its construction (see [1]). The
idea of using conditional independences has already been applied under elliptical
copula assumptions [17], [2l] and, in the case of DAGs, in a recent work [2].

We provide a method which uses some of the conditional independences encoded
by the Markov network underlying the variables. We give a theorem which,
under some graph conditions, makes it possible to derive a pair-copula decomposition
of the probability density function associated to a Markov network.

As the underlying Markov network is usually unknown, we first have to discover 
it from the sample data. Using our results published in [33] and [21] we will show 
how to derive a multidimensional copula model exploiting the information on 
conditional independences hidden in the sample data. 

Keywords Copula decomposition · t-cherry junction tree · Markov network ·
Cherry-wine probability distribution · Graphical models



1 Introduction 

Copulas in general are known to be a useful tool for modeling multivariate
probability distributions, since they serve as a link between the univariate marginals.
The pair-copula construction introduced by H. Joe [18] is able to encode several types
of dependencies at the same time, since it can be expressed as a product of different
types of bivariate copulas. For solving the problem which occurs when we want
to find consistent marginal copulas involved in the expression of a junction tree
copula density (see [21]), we found the concept of Regular-vine copulas extremely
useful. Our research in this direction was also motivated by open questions arising
in the papers published in this field, as follows.

The paper [1] calls attention to the fact that "conditional independence
may reduce the number of the pair-copula decompositions and hence simplify the
construction". That paper points out the importance of choosing a good factorisation
which takes advantage of the conditional independence relations between the random
variables. In the present paper we give a method for finding the
pair-copula construction which exploits the conditional independences between
the variables of a given Markov network. We also give a method for constructing
Regular-vines starting from a multivariate data set.

The importance of taking into account the conditional independences between
the variables encoded in a Bayesian Network (directed acyclic graph) was explored
in the papers [2l] and [17]. Two problems are discussed there. First, when
the Bayesian Network (BN) is known, some of the conditional independences taken
from the BN are used to simplify a given expression of the D- or C-vine copula.
Second, the problem of reconstructing the BN from a sample data set is
formulated under the assumption that the joint distribution is multivariate normal.
For discovering the independences and conditional independences between the
variables, [17] uses the correlations, the conditional correlations and the
determinant of the correlation matrix. In the present paper we also exploit the
conditional independences, but those encoded in a Markov network, which has the
advantage that we do not need to know the ordering of the random variables. We will
express the conditional independences in terms of information theoretical concepts,
which do not need any assumption on the type of copula.

In the recent work [2] Bauer et al. deal with a more general case, the
pair-copula constructions for non-Gaussian BNs. In their paper the BN is supposed
to be known. The formula of the probability distribution associated to the given BN
is expressed by pair-copulas. A similar idea will be used in our approach: we will
transform the so-called cherry-tree copula introduced in [21] into a vine copula
constructed from pair-copula blocks.

The truncated Regular-vine copula is defined in [23] and [5]. In [23] an algorithm
is developed for searching for the "best vine". This algorithm uses the partial
correlations. That paper gave us the idea of proving a theorem which ensures
the construction of the best truncated Regular-vine distribution at a given level k.
In order to find such a representation we give a greedy algorithm, which generally
is a good heuristic, but if some assumptions are fulfilled the algorithm yields the
optimal solution.

The work of the present paper is strongly related to Markov networks, which
need some graph theoretical concepts, to copulas and to the special case
of Regular-vine copulas. The second part of the paper is therefore a preliminary part
that contains the concepts we will use throughout the paper. The third part
discusses under which graphical conditions of the Markov network the
multivariate copula can be expressed as a junction tree copula and as a cherry-tree
copula. Then we give a pair-copula construction (formula) and a Regular-vine
structure (graphical structure) of the cherry-tree copula. The fourth part
presents a method for finding the cherry-tree copula starting from a
multivariate sample data set. In the fifth part we discuss the properties of the best
fitting probability density and copula density associated to a truncated R-vine. We
finish the paper with some conclusions.



2 Preliminaries 

In this section we introduce some concepts used in graph theory and probability
theory that we need throughout the paper and present how these can be linked to
each other. For a good overview see [26].



2.1 Markov Network 

We first present the acyclic hypergraphs and junction trees. We then present a 
short reminder on Markov network. We finish this part with the multivariate joint 
probability distribution associated to a junction tree. 

Let $V = \{1, \ldots, d\}$ be a set of vertices and $\Gamma$ a set of subsets of $V$ called the set of
hyperedges. A hypergraph consists of a set $V$ of vertices and a set $\Gamma$ of hyperedges.
We denote a hyperedge by $C_i$, where $C_i$ is a subset of $V$. If two vertices are in the
same hyperedge they are connected, which means that a hyperedge of a hypergraph
is a complete graph on the set of vertices contained in it.

The acyclic hypergraph is a special type of hypergraph which fulfills the following 
requirements: 

— None of the hyperedges of $\Gamma$ is a subset of another hyperedge.

— There exists a numbering of the hyperedges for which the running intersection property is
fulfilled: $\forall j \ge 2\ \exists i < j : C_i \supseteq C_j \cap (C_1 \cup \ldots \cup C_{j-1})$. (Another formulation is
that for all hyperedges $C_i$ and $C_j$ with $i < j-1$, $C_i \cap C_j \subset C_s$ for all $s$, $i < s < j$.)

Let $S_j = C_j \cap (C_1 \cup \ldots \cup C_{j-1})$ for $j > 1$ and $S_1 = \emptyset$. Let $R_j = C_j \setminus S_j$. We say
that $S_j$ separates $R_j$ from $(C_1 \cup \ldots \cup C_{j-1}) \setminus S_j$, and call $S_j$ separator set or shortly
separator.

Now we link these concepts to the terminology of junction trees. 

The junction tree is a special tree structure which is equivalent to the connected
acyclic hypergraphs [26]. The nodes of the tree correspond to the hyperedges of
the connected acyclic hypergraph and are called clusters; the edges of the tree
correspond to the separator sets and are called separators. The set of all clusters is
denoted by $\mathcal{C}$, the set of all separators by $\mathcal{S}$. The junction tree with the
largest cluster containing k variables is called k-width junction tree. A vertex which
is contained in only one cluster is called simplicial. A cluster which contains a
simplicial vertex is called leaf cluster.

An important relation between graphs and hypergraphs is given in [26]: a
hypergraph is acyclic if and only if it can be considered to be the set of cliques of
a triangulated graph (a graph is triangulated if every cycle of length at least
4 has a chord).

In Figure 1 one can see a) a triangulated graph, b) the corresponding
acyclic hypergraph and c) the corresponding junction tree.

Fig. 1 a) Triangulated graph, b) The corresponding acyclic hypergraph, c) The corresponding
junction tree which is a t-cherry junction tree



We consider the random vector $\mathbf{X} = (X_1, \ldots, X_d)^T$, with the set of indices
$V = \{1, \ldots, d\}$. Roughly speaking, a Markov network encodes the conditional
independences between the random variables. The graph structure associated
to a Markov network consists of the set of nodes $V$ and the set of edges

$E = \{(i,j) \mid i,j \in V\}$.

We say that the probability distribution associated to a Markov network has
the global Markov (GM) property [16] if, for all $A, B, C \subset V$ such that $C$ separates
$A$ and $B$ in the graph, $\mathbf{X}_A$ and $\mathbf{X}_B$ are conditionally independent given
$\mathbf{X}_C$, which means in terms of probabilities that

$$P(\mathbf{X}_{A\cup B\cup C}) = \frac{P(\mathbf{X}_{A\cup C})\,P(\mathbf{X}_{B\cup C})}{P(\mathbf{X}_C)}.$$

The concept of junction tree probability distribution is related to the junction
tree graph and to the global Markov property of the graph. A junction tree
probability distribution is defined as a product and division of marginal probability
distributions as follows:

$$P(\mathbf{X}) = \frac{\prod_{C\in\mathcal{C}} P(\mathbf{X}_C)}{\prod_{S\in\mathcal{S}} \left[P(\mathbf{X}_S)\right]^{\nu_S-1}},$$

where $\mathcal{C}$ is the set of clusters of the junction tree, $\mathcal{S}$ is the set of separators, and $\nu_S$ is
the number of those clusters which contain the separator $S$. We emphasize here
that equalities written as $P(\mathbf{X}) = f(P(\mathbf{X}_K), K \in \mathcal{C})$, where $f$ is a real-valued function, hold
for any possible realization of $\mathbf{X}$.

Example 1 The probability distribution corresponding to Figure 1 is:

$$P(\mathbf{X}) = \frac{P(\mathbf{X}_{\{1,2,3\}})\,P(\mathbf{X}_{\{2,3,4\}})\,P(\mathbf{X}_{\{3,4,5\}})}{P(\mathbf{X}_{\{2,3\}})\,P(\mathbf{X}_{\{3,4\}})} = \frac{P(X_1,X_2,X_3)\,P(X_2,X_3,X_4)\,P(X_3,X_4,X_5)}{P(X_2,X_3)\,P(X_3,X_4)}.$$
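The formula of Example 1 can be tried out numerically. The following is only a minimal sketch, assuming five binary variables and a randomly generated positive joint table; the helper names `marginal` and `junction_tree_pd` are hypothetical and not taken from the paper. It builds the junction tree pd from the marginals of the joint table and checks that the result sums to one and keeps the cluster marginals.

```python
from itertools import product
import random


def marginal(p, idx):
    """Marginal of the joint table p (dict: tuple -> prob) on positions idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + px
    return out


def junction_tree_pd(p, clusters, separators):
    """Product of cluster marginals divided by separator marginals.

    `separators` holds one entry per junction tree edge (repeats allowed).
    """
    cm = [(c, marginal(p, c)) for c in clusters]
    sm = [(s, marginal(p, s)) for s in separators]
    q = {}
    for x in p:
        val = 1.0
        for c, m in cm:
            val *= m[tuple(x[i] for i in c)]
        for s, m in sm:
            val /= m[tuple(x[i] for i in s)]
        q[x] = val
    return q


# Assumption: an arbitrary strictly positive joint table on 5 binary variables.
random.seed(0)
states = list(product((0, 1), repeat=5))
weights = [random.random() + 0.1 for _ in states]
total = sum(weights)
p = {x: w / total for x, w in zip(states, weights)}

# 0-based positions: X1..X5 -> 0..4, clusters and separators of Example 1.
clusters = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
separators = [(1, 2), (2, 3)]
q = junction_tree_pd(p, clusters, separators)
```

In general `q` differs from `p`; the two coincide when `p` itself has the global Markov property with respect to the graph of Figure 1.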
In our paper [33] we introduced a special kind of k-width junction tree, called
k-th order t-cherry junction tree, in order to approximate a joint probability
distribution. The k-th order t-cherry junction tree probability distribution is assigned
to the k-th order t-cherry tree, which was introduced in [6], [7].

Definition 1 The recursive construction of the k-th order t-cherry tree:

— (i) The complete graph of $(k-1)$ nodes from $V$ represents the smallest k-th
order t-cherry tree;

— (ii) By connecting a new vertex $i_k \in V$ with all vertices $\{i_1, \ldots, i_{k-1}\}$ of
a $(k-1)$-dimensional complete subgraph of the existing k-th order t-cherry
tree, we obtain a new k-th order t-cherry tree; $\{\{i_k\}, \{i_1, \ldots, i_{k-1}\}\}$ is called
k-th order hypercherry.

— (iii) A k-th order t-cherry tree can be obtained from (i) by successive application
of (ii).

The k-th order t-cherry tree is a special triangulated (chordal or rigid circuit)
graph, therefore a junction tree structure is associated to it (see [26]).

Definition 2 ([33]) The k-th order t-cherry junction tree is defined in the following
way:

— By using Definition 1 we construct a k-th order t-cherry tree over $V$.

— To each hypercherry $\{\{i_k\}, \{i_1, \ldots, i_{k-1}\}\}$ is assigned a cluster $\{i_1, \ldots, i_k\}$,
which represents a node of the junction tree, and a separator $\{i_1, \ldots, i_{k-1}\}$,
which is an edge of the junction tree.

We denote by $\mathcal{C}_{ch}$ and $\mathcal{S}_{ch}$ the set of clusters and separators of the t-cherry
junction tree.

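Definition 3 ([33]) The probability distributions given by (1) and (2) are called
t-cherry junction tree probability distributions:

$$P_{t\text{-}ch}(\mathbf{X}) = \frac{\prod_{K\in\mathcal{C}_{ch}} P(\mathbf{X}_K)}{\prod_{S\in\mathcal{S}_{ch}} \left(P(\mathbf{X}_S)\right)^{\nu_S-1}} \qquad (1)$$

in the discrete case and

$$f_{t\text{-}ch}(\mathbf{x}) = \frac{\prod_{K\in\mathcal{C}_{ch}} f_{\mathbf{X}_K}(\mathbf{x}_K)}{\prod_{S\in\mathcal{S}_{ch}} \left(f_{\mathbf{X}_S}(\mathbf{x}_S)\right)^{\nu_S-1}} \qquad (2)$$

in the continuous case, where $\nu_S$ denotes the number of clusters which contain the
separator $S$.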

Remark 1 The marginal probability distributions and the density functions
involved in the above formulas are marginal probability distributions of $P(\mathbf{X})$.

Example 1 shows a 3-rd order t-cherry junction tree probability distribution.

In the following, instead of probability distribution associated to a junction tree
we will shortly write junction tree pd, and similarly instead of k-th order t-cherry
junction tree probability distribution we will shortly write k-th order t-cherry pd.
2.2 Copula, Regular-vine copula, junction tree copula and cherry-tree copula

Definition 4 A function $C : [0;1]^d \to [0;1]$ is called a d-dimensional copula if it
satisfies the following conditions:

1. $C(u_1, \ldots, u_d)$ is increasing in each component $u_i$,

2. $C(u_1, \ldots, u_{i-1}, 0, u_{i+1}, \ldots, u_d) = 0$ for all $u_k \in [0;1]$, $k \ne i$, $i = 1, \ldots, d$,

3. $C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i$ for all $u_i \in [0;1]$, $i = 1, \ldots, d$,

4. $C$ is d-increasing, i.e. for all $(u_{1,1}, \ldots, u_{1,d})$ and $(u_{2,1}, \ldots, u_{2,d})$ in $[0;1]^d$ with
$u_{1,i} < u_{2,i}$ for all $i$, we have

$$\sum_{i_1=1}^{2} \cdots \sum_{i_d=1}^{2} (-1)^{i_1+\ldots+i_d}\, C(u_{i_1,1}, \ldots, u_{i_d,d}) \ge 0.$$

Due to Sklar's theorem, if $X_1, \ldots, X_d$ are continuous random variables defined
on a common probability space, with the univariate marginal cdf's $F_{X_i}(x_i)$
and the joint cdf $F_{X_1,\ldots,X_d}(x_1, \ldots, x_d)$, then there exists a unique copula function
$C_{X_1,\ldots,X_d}(u_1, \ldots, u_d) : [0;1]^d \to [0;1]$ such that by the substitution
$u_i = F_i(x_i)$, $i = 1, \ldots, d$, we get

$$F_{X_1,\ldots,X_d}(x_1, \ldots, x_d) = C_{X_1,\ldots,X_d}(F_1(x_1), \ldots, F_d(x_d))$$

for all $(x_1, \ldots, x_d)^T \in \mathbb{R}^d$.

In the following we will use the vectorial notation $F_{\mathbf{X}_V}(\mathbf{x}_V) = C_{\mathbf{X}_V}(\mathbf{u}_V)$, where
$\mathbf{u}_V = (F_{X_{i_1}}(x_{i_1}), \ldots, F_{X_{i_d}}(x_{i_d}))$.

It is known that

$$f_{X_{i_1},\ldots,X_{i_d}}(x_{i_1}, \ldots, x_{i_d}) = c_{X_{i_1},\ldots,X_{i_d}}\left(F_{X_{i_1}}(x_{i_1}), \ldots, F_{X_{i_d}}(x_{i_d})\right) \cdot \prod_{k=1}^{d} f_{X_{i_k}}(x_{i_k}).$$

In vectorial notation this can be written as

$$f_{\mathbf{X}_V}(\mathbf{x}_V) = c_{\mathbf{X}_V}(\mathbf{u}_V) \cdot \prod_{i_k\in V} f_{X_{i_k}}(x_{i_k}),$$

which can be written as

$$c_{\mathbf{X}_V}(\mathbf{u}_V) = \frac{f_{\mathbf{X}_V}(\mathbf{x}_V)}{\prod_{i_k\in V} f_{X_{i_k}}(x_{i_k})}.$$

The Regular-vine structures were introduced by T. Bedford and R. Cooke in
[3], [4] and described in more detail in [25].

If it does not cause confusion, instead of $f_{\mathbf{X}_D}$ and $c_{\mathbf{X}_D}$ we will write $f_D$ and
$c_D$. We also introduce the following notations:

$F_{i,j|D}$ – the conditional probability distribution function of $X_i$ and $X_j$ given $\mathbf{X}_D$,
$f_{i,j|D}$ – the conditional probability density function of $X_i$ and $X_j$ given $\mathbf{X}_D$,
$c_{i,j|D}$ – the conditional copula density corresponding to $f_{i,j|D}$,
where $D \subset V$; $i,j \in V \setminus D$.

According to the definition in [25]:
Definition 5 A Regular-vine (R-vine) on d variables consists first of a sequence of
trees $T_1, T_2, \ldots, T_{d-1}$ with nodes $N_i$ and edges $E_i$ for $i = 1, \ldots, d-1$, which satisfy
the following conditions.

— $T_1$ has nodes $N_1 = \{1, \ldots, d\}$ and edges $E_1$.

— For $i = 2, \ldots, d-1$ the tree $T_i$ has nodes $N_i = E_{i-1}$.

— Two edges in tree $T_i$ are joined in tree $T_{i+1}$ only if they share a common node
in tree $T_i$.

The last condition is usually referred to as the proximity condition.
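The three conditions of Definition 5 can be checked mechanically. The sketch below is illustrative only: the encoding (an edge is a `frozenset` of two nodes, and the nodes of a higher tree are the edges of the tree below) and the names `edge`, `is_tree`, `is_regular_vine` are assumptions made for this example. The sample vine is the 6-variable structure used later in Section 3.

```python
def edge(a, b):
    """An undirected edge between two nodes (ints or lower-level edges)."""
    return frozenset((a, b))


def is_tree(nodes, edges):
    """True if `edges` form a connected graph with |nodes| - 1 edges on `nodes`."""
    if len(edges) != len(nodes) - 1:
        return False
    if any(len(e) != 2 or not e <= nodes for e in edges):
        return False
    adj = {n: set() for n in nodes}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes


def is_regular_vine(d, trees):
    """Check Definition 5 for a list of d - 1 edge lists."""
    nodes = set(range(1, d + 1))
    if len(trees) != d - 1:
        return False
    for level, edges in enumerate(trees):
        if not is_tree(nodes, edges):
            return False
        # Proximity: from the second tree on, the two joined nodes are edges
        # of the previous tree and must share exactly one of its nodes.
        if level > 0 and any(len(a & b) != 1 for a, b in map(tuple, edges)):
            return False
        nodes = set(edges)  # nodes of the next tree are the edges of this one
    return True


t1 = [edge(1, 2), edge(2, 3), edge(2, 6), edge(3, 4), edge(4, 5)]
e12, e23, e26, e34, e45 = t1
t2 = [edge(e12, e23), edge(e23, e26), edge(e23, e34), edge(e34, e45)]
e123, e236, e234, e345 = t2
t3 = [edge(e123, e234), edge(e234, e236), edge(e234, e345)]
e1234, e2346, e2345 = t3
t4 = [edge(e1234, e2345), edge(e2345, e2346)]
t5 = [edge(t4[0], t4[1])]
vine = [t1, t2, t3, t4, t5]

# Joining the edges (1,2) and (4,5) violates proximity: they share no node.
bad_t2 = [edge(e12, e23), edge(e23, e26), edge(e23, e34), edge(e12, e45)]
broken = [t1, bad_t2, t3, t4, t5]
```

Here `is_regular_vine(6, vine)` accepts the structure, while `broken` is rejected at the second tree.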

It is shown in [3] and [25] that the edges in an R-vine tree can be uniquely
identified by two nodes, the conditioned nodes, and a set of conditioning nodes, i.e.,
edges are denoted by $e = j(e), k(e) \mid D(e)$, where $D(e)$ is the conditioning set. For
a good overview see [12]. The next theorem, which can be regarded as a central
theorem of R-vines (see [3]), links the probability density function to the copulas
assigned to the R-vine structure. In [3] it is shown that there exists a unique
probability density assigned to the R-vine, and that this probability
distribution can be expressed as (3).

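Theorem 1 The joint density of $\mathbf{X} = (X_1, \ldots, X_d)$ is uniquely determined and given
by:

$$f(x_1, \ldots, x_d) = \prod_{k=1}^{d} f_k(x_k) \cdot \prod_{i=1}^{d-1} \prod_{e\in E_i} c_{j(e),k(e)|D(e)}\left(F_{j(e)|D(e)}(x_{j(e)}|\mathbf{x}_{D(e)}),\, F_{k(e)|D(e)}(x_{k(e)}|\mathbf{x}_{D(e)})\right). \qquad (3)$$

The arguments of the pair copulas are conditional distribution functions and
can be evaluated using the following expression given by H. Joe [18]:

$$F_{j(e)|D(e)}(x_{j(e)}|\mathbf{x}_{D(e)}) = \frac{\partial\, C_{j(e),i|D(e)\setminus i}\left(F_{j(e)|D(e)\setminus i}(x_{j(e)}|\mathbf{x}_{D(e)\setminus i}),\, F_{i|D(e)\setminus i}(x_i|\mathbf{x}_{D(e)\setminus i})\right)}{\partial F_{i|D(e)\setminus i}(x_i|\mathbf{x}_{D(e)\setminus i})},$$

where $i \in D(e)$, $j(e) \notin D(e)$.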
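To make the partial-derivative expression concrete, here is a small sketch for the simplest case, a single conditioning variable, so that the conditional cdf is just the derivative of a bivariate copula with respect to its second argument. The Clayton family is used purely as an assumption for illustration (the paper does not prescribe a copula family), and the function names are hypothetical. The closed form is compared with a central finite difference.

```python
def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """Conditional cdf F(u | v) = dC(u, v)/dv in closed form."""
    base = u ** -theta + v ** -theta - 1.0
    return v ** (-theta - 1.0) * base ** (-1.0 / theta - 1.0)


def numeric_h(cdf, u, v, theta, eps=1e-6):
    """Central finite-difference approximation of dC(u, v)/dv."""
    return (cdf(u, v + eps, theta) - cdf(u, v - eps, theta)) / (2.0 * eps)


u, v, theta = 0.3, 0.6, 2.0
exact = clayton_h(u, v, theta)
approx = numeric_h(clayton_cdf, u, v, theta)
```

For larger conditioning sets the same derivative is applied recursively, with the arguments themselves being conditional cdf's obtained one level lower.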

We now give another definition, which is related to the k-th order t-cherry
junction tree structure (see Definition 2), which is in fact a k-width uniform
hypertree.

Definition 6 The Regular-vine structure is given by a sequence of t-cherry junction
trees $T_1, T_2, \ldots, T_{d-1}$ as follows:

— $T_1$ is a regular tree on $V = \{1, \ldots, d\}$, with the set of edges
$E_1 = \{e_i^1 = (l_i, m_i),\ i = 1, \ldots, d-1,\ l_i, m_i \in V\}$. The copula densities
$c_{l_i,m_i}(F_{l_i}(x_{l_i}), F_{m_i}(x_{m_i}))$ are assigned to the edges of this tree.

— $T_2$ is the second order t-cherry junction tree on $V = \{1, \ldots, d\}$, with the set of
clusters $E_2 = \{e_i^2,\ i = 1, \ldots, d-1 \mid e_i^2 = e_i^1\}$, $|e_i^2| = 2$. The copula densities

$$c_{a_{ij}^2, b_{ij}^2 | S_{ij}^2}\left(F_{a_{ij}^2|S_{ij}^2}(x_{a_{ij}^2}|\mathbf{x}_{S_{ij}^2}),\, F_{b_{ij}^2|S_{ij}^2}(x_{b_{ij}^2}|\mathbf{x}_{S_{ij}^2})\right)$$

are assigned to each pair of clusters $e_i^2$ and $e_j^2$ which are linked in the junction
tree $T_2$, where:

$$S_{ij}^2 = e_i^2 \cap e_j^2, \qquad a_{ij}^2 = e_i^2 - S_{ij}^2, \qquad b_{ij}^2 = e_j^2 - S_{ij}^2.$$

— $T_k$ is one of the possible k-th order t-cherry junction trees on $V = \{1, \ldots, d\}$,
with the set of clusters $E_k = \{e_i^k,\ i = 1, \ldots, d-k+1\}$, where each $e_i^k$, $|e_i^k| = k$,
is obtained from the union of two linked clusters in the $(k-1)$-th order t-cherry
junction tree $T_{k-1}$. The copula densities

$$c_{a_{ij}^k, b_{ij}^k | S_{ij}^k}\left(F_{a_{ij}^k|S_{ij}^k}(x_{a_{ij}^k}|\mathbf{x}_{S_{ij}^k}),\, F_{b_{ij}^k|S_{ij}^k}(x_{b_{ij}^k}|\mathbf{x}_{S_{ij}^k})\right)$$

are assigned to each pair of clusters $e_i^k$ and $e_j^k$ which are linked in the
junction tree, where:

$$S_{ij}^k = e_i^k \cap e_j^k, \qquad a_{ij}^k = e_i^k - S_{ij}^k, \qquad b_{ij}^k = e_j^k - S_{ij}^k.$$
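Theorem 2 The Regular-vine probability distribution associated to the R-vine structure
given in Definition 6 can be expressed as:

$$f(x_1, \ldots, x_d) = \left[\prod_{i=1}^{d} f_i(x_i)\right] \left[\prod_{i=1}^{d-1} c_{l_i,m_i}\left(F_{l_i}(x_{l_i}), F_{m_i}(x_{m_i})\right)\right] \cdot \prod_{k=2}^{d-1}\ \prod_{e_i^k \sim e_j^k} c_{a_{ij}^k, b_{ij}^k | S_{ij}^k}\left(F_{a_{ij}^k|S_{ij}^k}(x_{a_{ij}^k}|\mathbf{x}_{S_{ij}^k}),\, F_{b_{ij}^k|S_{ij}^k}(x_{b_{ij}^k}|\mathbf{x}_{S_{ij}^k})\right),$$

where the inner product runs over the pairs of clusters $e_i^k$, $e_j^k$ which are linked in $T_k$.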

For the following remark see [1], p. 186.

Remark 2 $X_i$ and $X_j$ are conditionally independent given the set of variables $\mathbf{X}_A$,
$A \subset V \setminus \{i, j\}$, if and only if

$$c_{ij|A}\left(F_{i|A}(x_i|\mathbf{x}_A),\, F_{j|A}(x_j|\mathbf{x}_A)\right) = 1.$$

The following theorem is an important consequence of Theorem 1.



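Theorem 3 If in an R-vine the conditional copula densities corresponding to the
trees $T_k, T_{k+1}, \ldots, T_{d-1}$ are all equal to 1, then there exists a joint probability
distribution which can be expressed only with the conditional copula densities assigned to
$T_1, \ldots, T_{k-1}$:

$$f(x_1, \ldots, x_d) = \left[\prod_{i=1}^{d} f_i(x_i)\right] \left[\prod_{i=1}^{d-1} c_{l_i,m_i}\left(F_{l_i}(x_{l_i}), F_{m_i}(x_{m_i})\right)\right] \cdot \prod_{m=2}^{k-1}\ \prod_{e_i^m \sim e_j^m} c_{a_{ij}^m, b_{ij}^m | S_{ij}^m}\left(F_{a_{ij}^m|S_{ij}^m}(x_{a_{ij}^m}|\mathbf{x}_{S_{ij}^m}),\, F_{b_{ij}^m|S_{ij}^m}(x_{b_{ij}^m}|\mathbf{x}_{S_{ij}^m})\right),$$

where the inner product runs over the pairs of clusters $e_i^m$, $e_j^m$ which are linked in $T_m$.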

The following definition of a truncated vine at level k is given in [5].

Definition 7 A pairwisely truncated R-vine at level k (or truncated R-vine at level k)
is a special R-vine copula with the property that all pair-copulas with a conditioning
set of size equal to or larger than k are set to bivariate independence copulas.

The following questions arise. What special properties does the probability
distribution have if we set to 1 the conditional copula densities associated to the
trees $T_k, \ldots, T_{d-1}$ of its Regular-vine? What are the properties of the associated
Markov network? We will answer these questions in Section 3 and Section 5.

The problem of finding the optimal truncation of the vine structure is formulated
in [23] as follows: "If we assume that we can assign the independent copula
to nodes of the vine with small absolute values of partial correlations, then this
algorithm can be used to find an optimal truncation of a vine structure." Kurowicka
defined as "best vine" the one whose nodes of the top trees (trees with most
conditioning) correspond to the smallest absolute partial correlations. However, a small
partial correlation implies conditional independence only under restrictive assumptions,
so our approach deals with a more general case in Section 3.

In [21] we proved a theorem which connects the general junction tree probability
distributions with the junction tree copulas. This theorem can be adapted
to the t-cherry junction trees in the following way.

Theorem 4 The copula density function associated to a junction tree probability
distribution defined in Definition 3,

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\prod_{K\in\mathcal{C}_{ch}} f_{\mathbf{X}_K}(\mathbf{x}_K)}{\prod_{S\in\mathcal{S}_{ch}} \left[f_{\mathbf{X}_S}(\mathbf{x}_S)\right]^{\nu_S-1}},$$

is given by

$$c_{\mathbf{X}}(\mathbf{u}_V) = \frac{\prod_{K\in\mathcal{C}_{ch}} c_{\mathbf{X}_K}(\mathbf{u}_K)}{\prod_{S\in\mathcal{S}_{ch}} \left[c_{\mathbf{X}_S}(\mathbf{u}_S)\right]^{\nu_S-1}}. \qquad (4)$$

Definition 8 The copula density given by Formula (4) is called t-cherry junction
tree copula density or simply t-cherry copula.



3 The characteristics of the Markov network associated to a continuous 
joint pd which can be expressed as a truncated R-vine 

In this part we refer to Regular-vines as they are defined in Definition 6. First we
illustrate the main ideas on an example.

The edge set of the first tree and the sequence of the t-cherry trees (in Figure
2) together with the copula densities determined by Definition 6 are the following:
Fig. 2 Example for an R-vine structure on 6 variables using Definition 5



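$T_1$: $E_1 = \{(1,2), (2,3), (2,6), (3,4), (4,5)\}$, with the copula densities
$c_{1,2}$, $c_{2,3}$, $c_{2,6}$, $c_{3,4}$, $c_{4,5}$;

$T_2$: $E_2 = \{e_1^2 = (1,2),\ e_2^2 = (2,3),\ e_3^2 = (2,6),\ e_4^2 = (3,4),\ e_5^2 = (4,5)\}$
$S_{1,2}^2 = e_1^2 \cap e_2^2 = \{2\}$, $a_{1,2}^2 = e_1^2 - S_{1,2}^2 = \{1\}$, $b_{1,2}^2 = e_2^2 - S_{1,2}^2 = \{3\}$, $c_{a_{1,2}^2, b_{1,2}^2|S_{1,2}^2} = c_{1,3|2}$;
$S_{2,3}^2 = e_2^2 \cap e_3^2 = \{2\}$, $a_{2,3}^2 = e_2^2 - S_{2,3}^2 = \{3\}$, $b_{2,3}^2 = e_3^2 - S_{2,3}^2 = \{6\}$, $c_{a_{2,3}^2, b_{2,3}^2|S_{2,3}^2} = c_{3,6|2}$;
$S_{2,4}^2 = e_2^2 \cap e_4^2 = \{3\}$, $a_{2,4}^2 = e_2^2 - S_{2,4}^2 = \{2\}$, $b_{2,4}^2 = e_4^2 - S_{2,4}^2 = \{4\}$, $c_{a_{2,4}^2, b_{2,4}^2|S_{2,4}^2} = c_{2,4|3}$;
$S_{4,5}^2 = e_4^2 \cap e_5^2 = \{4\}$, $a_{4,5}^2 = e_4^2 - S_{4,5}^2 = \{3\}$, $b_{4,5}^2 = e_5^2 - S_{4,5}^2 = \{5\}$, $c_{a_{4,5}^2, b_{4,5}^2|S_{4,5}^2} = c_{3,5|4}$;

$T_3$: $E_3 = \{e_1^3 = (1,2,3),\ e_2^3 = (2,3,4),\ e_3^3 = (2,3,6),\ e_4^3 = (3,4,5)\}$
$S_{1,2}^3 = e_1^3 \cap e_2^3 = \{2,3\}$, $a_{1,2}^3 = e_1^3 - S_{1,2}^3 = \{1\}$, $b_{1,2}^3 = e_2^3 - S_{1,2}^3 = \{4\}$, $c_{a_{1,2}^3, b_{1,2}^3|S_{1,2}^3} = c_{1,4|2,3}$;
$S_{2,3}^3 = e_2^3 \cap e_3^3 = \{2,3\}$, $a_{2,3}^3 = e_2^3 - S_{2,3}^3 = \{4\}$, $b_{2,3}^3 = e_3^3 - S_{2,3}^3 = \{6\}$, $c_{a_{2,3}^3, b_{2,3}^3|S_{2,3}^3} = c_{4,6|2,3}$;
$S_{2,4}^3 = e_2^3 \cap e_4^3 = \{3,4\}$, $a_{2,4}^3 = e_2^3 - S_{2,4}^3 = \{2\}$, $b_{2,4}^3 = e_4^3 - S_{2,4}^3 = \{5\}$, $c_{a_{2,4}^3, b_{2,4}^3|S_{2,4}^3} = c_{2,5|3,4}$;

$T_4$: $E_4 = \{e_1^4 = (1,2,3,4),\ e_2^4 = (2,3,4,5),\ e_3^4 = (2,3,4,6)\}$
$S_{1,2}^4 = e_1^4 \cap e_2^4 = \{2,3,4\}$, $a_{1,2}^4 = e_1^4 - S_{1,2}^4 = \{1\}$, $b_{1,2}^4 = e_2^4 - S_{1,2}^4 = \{5\}$, $c_{a_{1,2}^4, b_{1,2}^4|S_{1,2}^4} = c_{1,5|2,3,4}$;
$S_{2,3}^4 = e_2^4 \cap e_3^4 = \{2,3,4\}$, $a_{2,3}^4 = e_2^4 - S_{2,3}^4 = \{5\}$, $b_{2,3}^4 = e_3^4 - S_{2,3}^4 = \{6\}$, $c_{a_{2,3}^4, b_{2,3}^4|S_{2,3}^4} = c_{5,6|2,3,4}$;

$T_5$: $E_5 = \{e_1^5 = (1,2,3,4,5),\ e_2^5 = (2,3,4,5,6)\}$
$S_{1,2}^5 = e_1^5 \cap e_2^5 = \{2,3,4,5\}$, $a_{1,2}^5 = e_1^5 - S_{1,2}^5 = \{1\}$, $b_{1,2}^5 = e_2^5 - S_{1,2}^5 = \{6\}$, $c_{a_{1,2}^5, b_{1,2}^5|S_{1,2}^5} = c_{1,6|2,3,4,5}$.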
The joint probability density function of $\mathbf{X} = (X_1, \ldots, X_6)$ can be expressed by
Theorem 2 as follows:

$$\begin{aligned}
f(x_1, x_2, x_3, x_4, x_5, x_6) = {} & \left(\prod_{i=1}^{6} f_i(x_i)\right) c_{1,2}(F_1(x_1), F_2(x_2)) \cdot c_{2,3}(F_2(x_2), F_3(x_3)) \cdot c_{2,6}(F_2(x_2), F_6(x_6)) \\
& \cdot c_{3,4}(F_3(x_3), F_4(x_4)) \cdot c_{4,5}(F_4(x_4), F_5(x_5)) \\
& \cdot c_{1,3|2}(F_{1|2}(x_1|x_2), F_{3|2}(x_3|x_2)) \cdot c_{3,6|2}(F_{3|2}(x_3|x_2), F_{6|2}(x_6|x_2)) \\
& \cdot c_{2,4|3}(F_{2|3}(x_2|x_3), F_{4|3}(x_4|x_3)) \cdot c_{3,5|4}(F_{3|4}(x_3|x_4), F_{5|4}(x_5|x_4)) \\
& \cdot c_{1,4|2,3}(F_{1|2,3}(x_1|x_2,x_3), F_{4|2,3}(x_4|x_2,x_3)) \\
& \cdot c_{4,6|2,3}(F_{4|2,3}(x_4|x_2,x_3), F_{6|2,3}(x_6|x_2,x_3)) \\
& \cdot c_{2,5|3,4}(F_{2|3,4}(x_2|x_3,x_4), F_{5|3,4}(x_5|x_3,x_4)) \\
& \cdot c_{1,5|2,3,4}(F_{1|2,3,4}(x_1|x_2,x_3,x_4), F_{5|2,3,4}(x_5|x_2,x_3,x_4)) \\
& \cdot c_{5,6|2,3,4}(F_{5|2,3,4}(x_5|x_2,x_3,x_4), F_{6|2,3,4}(x_6|x_2,x_3,x_4)) \\
& \cdot c_{1,6|2,3,4,5}(F_{1|2,3,4,5}(x_1|x_2,x_3,x_4,x_5), F_{6|2,3,4,5}(x_6|x_2,x_3,x_4,x_5)).
\end{aligned}$$

In this part we regard the graph of the Markov network as known. So let
us suppose that the Markov network, which encodes the conditional independences
between the random variables $X_1, \ldots, X_6$, is given in Figure 3.
Fig. 3 3-rd order t-cherry junction tree



If the Markov network has the structure in Figure 3, then it is easy to identify
the following conditional independences, which are consequences of the global
Markov property:

$X_1 \perp X_4 \mid X_2, X_3$; $X_4 \perp X_6 \mid X_2, X_3$; $X_2 \perp X_5 \mid X_3, X_4$;
$X_1 \perp X_5 \mid X_2, X_3, X_4$; $X_5 \perp X_6 \mid X_2, X_3, X_4$;
$X_1 \perp X_6 \mid X_2, X_3, X_4, X_5$.
Based on the existence of these conditional independences, the conditional copula
densities associated to the trees $T_3, T_4, T_5$,

$c_{1,4|2,3}(F_{1|2,3}(x_1|x_2,x_3), F_{4|2,3}(x_4|x_2,x_3))$,
$c_{4,6|2,3}(F_{4|2,3}(x_4|x_2,x_3), F_{6|2,3}(x_6|x_2,x_3))$,
$c_{2,5|3,4}(F_{2|3,4}(x_2|x_3,x_4), F_{5|3,4}(x_5|x_3,x_4))$,
$c_{1,5|2,3,4}(F_{1|2,3,4}(x_1|x_2,x_3,x_4), F_{5|2,3,4}(x_5|x_2,x_3,x_4))$,
$c_{5,6|2,3,4}(F_{5|2,3,4}(x_5|x_2,x_3,x_4), F_{6|2,3,4}(x_6|x_2,x_3,x_4))$,
$c_{1,6|2,3,4,5}(F_{1|2,3,4,5}(x_1|x_2,x_3,x_4,x_5), F_{6|2,3,4,5}(x_6|x_2,x_3,x_4,x_5))$,

are all equal to 1. We can observe here that a Markov network of the form of a
3-rd order t-cherry tree (see Definition 1) can be expressed as an R-vine truncated
at level 3.

This example suggests that there are t-cherry tree probability distributions
which can be represented as truncated vines.

In the following we consider the case when the set of separators of the k-th
order t-cherry junction tree forms a $(k-1)$-th order t-cherry junction tree. In this
case we give an algorithm which constructs a Regular-vine structure associated
to a k-th order t-cherry tree probability distribution (see Definition 3).

Algorithm 1 Algorithm for obtaining a truncated Regular-vine construction from a
t-cherry junction tree.

Input: A t-cherry tree structure, i.e. a set of clusters of size k, and the junction
tree structure given by the separators.

Output: A Regular-vine truncated at level k.

We obtain recursively an $(m-1)$-width t-cherry junction tree from an m-width
t-cherry junction tree, for $m = k, \ldots, 1$, as follows:

— Step 1. The separators of the m-width t-cherry tree will be the clusters in the
$(m-1)$-width t-cherry tree. Two of them are linked if they are different and there
is one cluster between them in the m-width t-cherry tree.

— Step 2. The leaf clusters, i.e. those clusters which contain a simplicial node, are
transformed into $(m-1)$-width clusters by deleting a node which is not simplicial.
The cluster obtained in this way is connected to that cluster obtained in Step 1
which was the separator linked to it in the m-width t-cherry
junction tree.
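One pass of Steps 1 and 2 can be sketched as follows. This is an illustrative reading of the algorithm, not a reference implementation: the function name `reduce_once`, the data layout (clusters as `frozenset`s, links as index pairs) and the assumption that every leaf cluster has exactly one simplicial node are choices made for this example. Because Step 2 leaves a choice of which non-simplicial node to delete, the function returns all candidate $(m-1)$-width structures.

```python
from itertools import product


def reduce_once(clusters, links):
    """All (m-1)-width candidates obtained from an m-width t-cherry junction tree.

    clusters: list of frozensets; links: list of (i, j) index pairs.
    Returns a list of (set of new clusters, set of new links).
    """
    # Step 1: separators become clusters; two different separators are
    # linked if they sit on links incident to the same cluster.
    sep_of = {link: clusters[link[0]] & clusters[link[1]] for link in links}
    seps = set(sep_of.values())
    sep_links = set()
    for idx in range(len(clusters)):
        incident = {s for link, s in sep_of.items() if idx in link}
        sep_links |= {frozenset((s, t)) for s in incident for t in incident if s != t}

    # Step 2: each leaf cluster loses one non-simplicial node and is linked
    # to the separator it was attached to (assumes one simplicial node each).
    count = {}
    for c in clusters:
        for v in c:
            count[v] = count.get(v, 0) + 1
    options = []
    for c in clusters:
        simplicial = {v for v in c if count[v] == 1}
        if simplicial:
            sep = c - simplicial
            options.append([(c - {w}, sep) for w in sorted(sep)])

    results = []
    for choice in product(*options):
        new_clusters = seps | {nc for nc, _ in choice}
        new_links = sep_links | {frozenset((nc, sep)) for nc, sep in choice}
        results.append((new_clusters, new_links))
    return results


# Clusters of the 3-rd order example, assumed junction tree: 234 in the middle.
clusters = [frozenset(c) for c in ((1, 2, 3), (2, 3, 4), (2, 3, 6), (3, 4, 5))]
links = [(0, 1), (1, 2), (1, 3)]
candidates = reduce_once(clusters, links)
target = {frozenset(c) for c in ((1, 2), (2, 3), (2, 6), (3, 4), (4, 5))}
```

For this input there are eight candidates, one of which is the cluster set $\{(1,2), (2,3), (2,6), (3,4), (4,5)\}$ used in the example of this section.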



Definition 9 The Regular-vine structure obtained from a t-cherry tree structure
using Algorithm 1 is called cherry-wine structure.



Definition 10 The joint probability density assigned to a cherry-wine structure 
is called cherry-wine density, the corresponding copula density is called cherry-wine 
copula density. 



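Theorem 5 A t-cherry copula can be expressed as a cherry-wine copula in the following
way:

$$\frac{\prod_{K\in\mathcal{C}_{ch}} c_K(\mathbf{u}_K)}{\prod_{S\in\mathcal{S}_{ch}} \left[c_S(\mathbf{u}_S)\right]^{\nu_S-1}} = \left[\prod_{i=1}^{d-1} c_{l_i,m_i}\left(F_{l_i}(x_{l_i}), F_{m_i}(x_{m_i})\right)\right] \cdot \prod_{m=2}^{k-1}\ \prod_{e_i^m \sim e_j^m} c_{a_{ij}^m, b_{ij}^m | S_{ij}^m}\left(F_{a_{ij}^m|S_{ij}^m}(x_{a_{ij}^m}|\mathbf{x}_{S_{ij}^m}),\, F_{b_{ij}^m|S_{ij}^m}(x_{b_{ij}^m}|\mathbf{x}_{S_{ij}^m})\right),$$

where the inner product runs over the pairs of clusters $e_i^m$, $e_j^m$ which are linked in $T_m$.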



The example in Figure 4 shows how to apply Algorithm 1 to a given 3-rd order
t-cherry junction tree to obtain a cherry-wine structure.

The cherry-wine probability distribution assigned to the 3-rd order cherry-wine
structure in Figure 3 is:
Fig. 4 Application of Algorithm 1 to a 3-rd order t-cherry junction tree in order to obtain a
3-rd order cherry-wine structure



$$\begin{aligned}
f(x_1, x_2, x_3, x_4, x_5, x_6) = {} & \left(\prod_{i=1}^{6} f_i(x_i)\right) \cdot c_{1,2}(F_1(x_1), F_2(x_2)) \cdot c_{2,3}(F_2(x_2), F_3(x_3)) \\
& \cdot c_{2,6}(F_2(x_2), F_6(x_6)) \cdot c_{3,4}(F_3(x_3), F_4(x_4)) \cdot c_{4,5}(F_4(x_4), F_5(x_5)) \\
& \cdot c_{1,3|2}(F_{1|2}(x_1|x_2), F_{3|2}(x_3|x_2)) \cdot c_{3,6|2}(F_{3|2}(x_3|x_2), F_{6|2}(x_6|x_2)) \\
& \cdot c_{2,4|3}(F_{2|3}(x_2|x_3), F_{4|3}(x_4|x_3)) \cdot c_{3,5|4}(F_{3|4}(x_3|x_4), F_{5|4}(x_5|x_4)).
\end{aligned}$$

Remark 3 Applying Algorithm 1 can result in several cherry-wine structures, since in
Step 2 we can proceed in different directions.

Starting from the 3-rd order t-cherry junction tree given in Figure 3 we can
obtain $2^{\#\text{leaf clusters}} = 8$ 2-nd order t-cherry trees. In the last step there is only
one possibility to construct the first tree. We emphasize here that, if the Markov
network has the 3-rd order t-cherry tree structure in Figure 3, then from the 23,040
possible R-vines (see [13], [30]) only 8 remain.
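The figure 23,040 can be reproduced with a one-line computation. The sketch below assumes the closed-form count $\frac{d!}{2} \cdot 2^{\binom{d-2}{2}}$ for the number of Regular-vines on $d$ variables; the function name is hypothetical.

```python
from math import comb, factorial


def number_of_regular_vines(d):
    """Number of R-vines on d variables, assuming the count d!/2 * 2^C(d-2, 2)."""
    return factorial(d) // 2 * 2 ** comb(d - 2, 2)


total_vines = number_of_regular_vines(6)   # the case d = 6 of the example
remaining_share = 8 / total_vines          # share left after using the Markov network
```

For $d = 6$ this gives $360 \cdot 2^6 = 23{,}040$, so the eight cherry-wine structures are a very small fraction of all candidates.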
The question which arises here is whether the 3-rd order t-cherry junction tree
is a very special structure.

We proved in [33] the following theorem by a constructive method:

Theorem 6 Any k-width junction tree probability distribution can be expressed as a
k-th order t-cherry tree probability distribution.

Remark 4 There exist several expressions for the t-cherry probability distribution,
but far fewer than the number of Regular-vines, and they have the advantage of
exploiting the conditional independences.

4 A model selection for special R-vines called cherry-wines 

Now we suppose that we have a sample data set. Starting from this data set we want
to find a well fitting probability distribution. The main idea is to fit the copula
function and the marginal probability distributions separately. Using pair-copula
constructions we will express the joint density function only by marginal distributions
and bivariate (pair) copulas. First we will search for a well fitting Regular-vine
structure. As it is shown in [30], the number of possible Regular-vines grows
exponentially with the number of variables. So the basic idea is to search through
truncated R-vine copulas at a given level k.

Full inference for pair-copula decomposition should in principle consider three 
elements [1]: 

— The selection of a specific factorization; 

— The choice of pair-copula types; 

— The estimation of parameters of the chosen pair-copulas. 

This paper is concerned with finding a factorization which exploits some of
the conditional independences between the random variables.

There are many papers dealing with selecting specific Regular-vines such as C-vines
or D-vines, see for example [1].

The main idea of our approach is to find a t-cherry copula and then to transform
it by Algorithm 1 into a cherry-wine copula, which depends only on pair-copulas.
So we will start at a given level k, search for the t-cherry copula that best fits
the sample data, and then find the factorization which results from the chosen
k-th order t-cherry tree.

4.1 The Sample derivated copula 

The empirical probability distribution of the sample data is a discrete multivariate
probability distribution. If the data are drawn i.i.d. from a continuous joint
probability distribution, all realizations are different vectors, so the empirical joint
probability distribution is uniform. The number of different values taken by each
random variable is equal to the sample size N.

As it is shown in [21], we first make a partition of the range of each random
variable involved. The intervals obtained contain the same number of data points. We
introduced a special type of copula called sample derivated copula.
We denote the set of the values of $X_i$ in the sample by $\Lambda_i$. This set contains $N$
values for each random variable. The theoretical range of the continuous random
variable $X_i$ will be denoted by $\overline{\Lambda}_i$. For every $i$ we denote $\lambda_i^m = \min \overline{\Lambda}_i \in \mathbb{R}$
and $\lambda_i^M = \max \overline{\Lambda}_i \in \mathbb{R}$. We suppose for simplicity that $\min \Lambda_i \ne \min \overline{\Lambda}_i$ and
$\max \Lambda_i \ne \max \overline{\Lambda}_i$. For each random variable $X_i$ we define a partition of $\overline{\Lambda}_i$ by
$P_i = \{x_0^i = \lambda_i^m, x_1^i, \ldots, x_{m_i-1}^i, x_{m_i}^i = \lambda_i^M\}$ with the following properties:

- Each interval $(x_{j-1}^i; x_j^i]$, $j = 1, \ldots, m_i$, contains a given number $\frac{N}{m_i} \in \mathbb{N}$ of values from the set $\Lambda_i$.

- Each $x_j^i \in \Lambda_i$, $j = 1, \ldots, m_i - 1$.

The partition with the above properties will be called uniform partition. We
denote by $\mathcal{P}$ the set of partitions $\{P_1, \ldots, P_d\}$.

Let be Xi the categorical random variable associated to the random variable 

Xf. 

P(Xi 6 (a^ijja;^]) = — ,j = l,...,m,. 

We assign to each x l € ( x v ^_, ; a;^* the number u\ = —,j = 0, . . . , m;. Obviously 
V j J J J mj 

«o = an d uirii = 1- Let A = {uJIj = 0, ...,m»}. So we can define the following 

discrete uniform random variables: 

u % u\ .. 

Ui = | 1 | | | ,i = 1, . . . ,d. 

mi 

Now we transform the sample using the above assignment. We denote the trans- 
formed sample by T ■ 

d _ 

Definition 11 The function c : Y[ Ai ~ * R defined by 

z=l 

I u kl , . . . , u fctf 1 i-> clw fel ,...,M fc(i 1 = — ,ki = 0, ...,mj 

will be called sample derivated copula density. 

In Remark 6 of the paper [21] we also proved that, when the partition is made in
this way, the information content of the joint probability distribution depends
only on the sample derivated copula.

The sample derivated copula can be treated as a discrete multivariate probability
distribution. One of its advantages is that the range of the variables involved
is significantly reduced.
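The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in [21]: the function name `sample_derived_copula` is hypothetical, the same number `m` of intervals is assumed for every variable, the sample size is assumed to be a multiple of `m`, and no ties are assumed within a coordinate.

```python
from collections import Counter


def sample_derived_copula(sample, m):
    """Return the transformed sample and the sample derivated copula density.

    sample: list of N equal-length tuples (no ties within a coordinate)
    m:      number of intervals per variable (assumption: the same for all
            variables, and N is a multiple of m)
    """
    n = len(sample)
    d = len(sample[0])
    if n % m != 0:
        raise ValueError("the sample size must be a multiple of m")
    per_cell = n // m                      # N / m data points per interval

    cells = [[0] * d for _ in range(n)]
    for i in range(d):
        # Rank the observations of variable i; rank r falls in interval r // per_cell.
        order = sorted(range(n), key=lambda row: sample[row][i])
        for rank, row in enumerate(order):
            cells[row][i] = rank // per_cell + 1   # interval index j in 1..m

    # Each value is replaced by u_j = j / m.
    transformed = [tuple(j / m for j in row) for row in cells]

    # Density: relative frequency of each cell in the transformed sample.
    counts = Counter(tuple(row) for row in cells)
    density = {idx: cnt / n for idx, cnt in counts.items()}
    return transformed, density
```

The density is returned as a dictionary keyed by the interval indices $(k_1, \ldots, k_d)$; cells that contain no data point are simply absent, which corresponds to a density value of zero.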

Now, using the greedy Szantai-Kovacs algorithm introduced in [35], we find
a k-th order t-cherry copula. The goodness of fit to the data is quantified by
the Kullback-Leibler divergence. We emphasize here that finding the best fitting
t-cherry copula is an NP-hard problem for k > 2, but there are cases when the
greedy algorithm finds the optimal solution; see [35].
4.2 The Szantai-Kovacs greedy algorithm

We present here the algorithm introduced in [35] . 

The following theorem, which concerns discrete probability distributions, is
given in [33].

Theorem 7 The Kullback-Leibler divergence between the true $P(X)$ and the
approximation given by the k-width junction tree probability distribution $P_J(X)$,
determined by the set of clusters $\mathcal{C}$ and the set of separators $\mathcal{S}$, is:

$$KL\left(P(X), P_J(X)\right) = -H(X) - \left( \sum_{C \in \mathcal{C}} I(X_C) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(X_S) \right) + \sum_{i=1}^{d} H(X_i), \tag{5}$$

where $I(X_C) = \sum_{i \in C} H(X_i) - H(X_C)$ represents the information content of the
random vector $X_C$ and similarly $I(X_S) = \sum_{i \in S} H(X_i) - H(X_S)$ represents the
information content of the random vector $X_S$.

In Formula (5), $-H(X) + \sum_{i=1}^{d} H(X_i) = I(X)$ is independent of the structure
of the junction tree. It is easy to see that minimizing the Kullback-Leibler
divergence means maximizing
$\sum_{C \in \mathcal{C}} I(X_C) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(X_S)$. We call this sum
the weight of the junction tree pd. The larger this weight is, the better the
approximation associated to the junction tree pd fits the true probability
distribution. It is well known that $KL = 0$ if $P(X) = P_J(X)$.
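The identity in Theorem 7 can be checked numerically on a small example. The sketch below is only an illustration: the joint distribution of three binary variables, the helper names (`marginal`, `entropy`, `info_content`) and the chosen junction tree with clusters {0, 1}, {1, 2} and separator {1} are all assumptions made for the example, not taken from [33].

```python
from itertools import product
from math import log2


def marginal(p, keep):
    """Marginal pmf of the coordinates listed in `keep` from a joint pmf dict."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + px
    return out


def entropy(p):
    """Shannon entropy (base 2) of a pmf given as a dict."""
    return -sum(px * log2(px) for px in p.values() if px > 0)


def info_content(p, subset):
    """Information content I(X_C) = sum_{i in C} H(X_i) - H(X_C)."""
    return sum(entropy(marginal(p, (i,))) for i in subset) - entropy(marginal(p, subset))


# Hypothetical joint pmf of three binary variables (values chosen for illustration).
raw = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(raw)
p = {x: w / total for x, w in zip(product((0, 1), repeat=3), raw)}

clusters = [(0, 1), (1, 2)]     # clusters of the junction tree
separators = {(1,): 2}          # separator -> nu_S (number of clusters containing it)

# Junction tree approximation P_J = P_01 * P_12 / P_1.
p01, p12, p1 = marginal(p, (0, 1)), marginal(p, (1, 2)), marginal(p, (1,))
p_j = {x: p01[x[0], x[1]] * p12[x[1], x[2]] / p1[(x[1],)] for x in p}

# Left-hand side: Kullback-Leibler divergence computed from its definition.
kl_direct = sum(px * log2(px / p_j[x]) for x, px in p.items())

# Right-hand side: -H(X) - weight + sum of the univariate entropies.
weight = sum(info_content(p, c) for c in clusters) - sum(
    (nu - 1) * info_content(p, s) for s, nu in separators.items()
)
kl_formula = -entropy(p) - weight + sum(entropy(marginal(p, (i,))) for i in range(3))

print(kl_direct, kl_formula)
```

Up to floating-point rounding the two printed values agree, and only `weight` depends on the choice of clusters and separators.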

Definition 12 We define the following concepts: 

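— the search space:

$$E = \left\{ X_{i_k}(i_1, \ldots, i_{k-1}) = \left\{ \{X_{i_k}\}, \{X_{i_1}, \ldots, X_{i_{k-1}}\} \right\} \;\middle|\; X_{i_1}, \ldots, X_{i_{k-1}}, X_{i_k} \in X \right\},$$

— the independence set:

$$\mathcal{F} = \emptyset \cup \{\text{t-cherry junction tree structures}\},$$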

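— the weight function:

$$w\left(X_{i_k}(i_1, \ldots, i_{k-1})\right) = I\left(X_{i_1}, \ldots, X_{i_{k-1}}, X_{i_k}\right) - I\left(X_{i_1}, \ldots, X_{i_{k-1}}\right).$$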

Algorithm 2 Szantai-Kovacs's greedy algorithm.

Input: Elements of $E$ and their weights, which can be calculated based on the
k-th order marginal probability distributions.

Output: The set $A$, which contains the clusters of the k-th order t-cherry junction
tree pd, and the weight of the k-th order t-cherry junction tree pd.

The algorithm:

$A := \emptyset$;

Sort $E$ into monotonically decreasing order by weight $w$;
Choose $x = \arg\max_{x \in E} w(x)$;
let $A := A \cup \{x\}$; $E := E \setminus \{x\}$; $w := I(x)$;
Do for each $x \in E$ taken in monotonically decreasing order

if $A \cup \{x\} \in \mathcal{F}$ then let $A := A \cup \{x\}$; $E := E \setminus \{x\}$; $w := w + w(x)$;
if the union of subsets of $A$ is $X$, then Stop;
else take the next element of $E$.
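A minimal Python sketch of the greedy search follows. It is an illustration under stated assumptions rather than the implementation of [35]: the function name `greedy_t_cherry` and the callable `info` (which returns the information content of a set of variables) are hypothetical, the weight of an element is assumed to be the information content of the cluster minus that of its separator, the scan is restarted from the heaviest element after every accepted cluster, and at least k variables are assumed.

```python
from itertools import combinations


def greedy_t_cherry(variables, k, info):
    """Greedy search for a k-th order t-cherry junction tree.

    variables: iterable of hashable variable labels (at least k of them)
    k:         cluster size, k >= 2
    info:      callable mapping a frozenset of variables to its information content
    Returns (clusters, weight), where clusters is a list of frozensets.
    """
    variables = list(variables)

    # Search space E: every (new vertex, separator of size k-1) pair,
    # weighted by I(cluster) - I(separator)  (assumption, see the lead-in).
    search_space = []
    for sep in combinations(variables, k - 1):
        s = frozenset(sep)
        for v in variables:
            if v not in s:
                search_space.append((info(s | {v}) - info(s), v, s))
    search_space.sort(key=lambda item: item[0], reverse=True)

    # The heaviest element gives the first cluster; the weight starts at I(cluster).
    _, v0, s0 = search_space[0]
    clusters = [s0 | {v0}]
    covered = set(clusters[0])
    weight = info(clusters[0])

    # Add the heaviest element that keeps the structure a t-cherry junction tree:
    # its separator must lie inside an existing cluster and its vertex must be new.
    while len(covered) < len(variables):
        for w, v, s in search_space:
            if v not in covered and any(s <= c for c in clusters):
                clusters.append(s | {v})
                covered.add(v)
                weight += w
                break
    return clusters, weight
```

For k = 2 the information content of a single variable is zero and that of a pair is its mutual information, so the sketch reduces to a greedy construction of a spanning tree with large total mutual information.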



4.3 Building the cherry-wine associated to the t-cherry tree. 

We calculate the k-th order marginal pds from the sample derivated copula. Using
their information content we can define the weights of the elements of the search
space E.

Applying the Szantai-Kovacs algorithm we obtain a well-fitting t-cherry tree
copula.

To this k-th order t-cherry tree we assign the tree $T_k$ of a regular vine.
Applying Algorithm 1, we can then find the corresponding cherry-wine structure
and, using it, the cherry-wine copula density expressed by pair-copulas.

The next step is the choice of pair-copula types and the estimation of their
parameters. For choosing pair-copulas there is a large number of copula families
with different properties, such as tail dependence; see [18], [11] and [31].



5 Properties of the best fitting cherry-wine probability density, and 
cherry-wine copula density 

In this section we discuss the properties of the best fitting cherry-wine probabil- 
ity density and corresponding copula density, which are associated to an R-vine 
truncated at level k from a theoretical point of view. 
We will use the following notations: 

— $f_V(x_V)$ denotes the joint probability density of $X_V$; $f_K(x_K)$ is the
marginal density of $f_V(x_V)$, where $K \subset V$.

— $c_V(u_V)$ denotes the joint copula density associated to the joint probability
density $f_V(x_V)$; $c_K(u_K)$ is its marginal density, which is the copula density
corresponding to $f_K(x_K)$.

— $f_{V_{\mathcal{CS}}}$ denotes the joint k-th order cherry-wine density, associated
to a k-th order t-cherry junction tree with cluster set $\mathcal{C}$ and separator
set $\mathcal{S}$, given by:

$$f_{V_{\mathcal{CS}}}(x_V) = \frac{\prod_{K \in \mathcal{C}} f_K(x_K)}{\prod_{S \in \mathcal{S}} \left(f_S(x_S)\right)^{\nu_S - 1}}, \tag{6}$$

where $\nu_S$ is the number of clusters which contain $S$.
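As a small illustration of the cherry-wine density defined above (the index sets are chosen only for this example), take $d = 4$ and $k = 3$ with the clusters $\{1,2,3\}$ and $\{2,3,4\}$. The only separator is $\{2,3\}$, which is contained in both clusters, so $\nu_{\{2,3\}} = 2$ and

$$f_{V_{\mathcal{CS}}}(x_1, x_2, x_3, x_4) = \frac{f_{123}(x_1, x_2, x_3)\, f_{234}(x_2, x_3, x_4)}{f_{23}(x_2, x_3)} = f_{123}(x_1, x_2, x_3)\, f_{4|23}(x_4 \mid x_2, x_3).$$

This is exactly the density under which $X_1$ and $X_4$ are conditionally independent given $(X_2, X_3)$.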



Theorem 8 The Kullback-Leibler divergence between $f_V(x_V)$ and the approximating
probability density $f_{V_{\mathcal{CS}}}$ assigned to the cherry-wine is given by the formula:

$$KL\left(f_{V_{\mathcal{CS}}}(x_V), f_V(x_V)\right) = I(X_V) - \left[ \sum_{K \in \mathcal{C}} I(X_K) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(X_S) \right]. \tag{7}$$



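Proof

$$\begin{aligned}
KL\left(f_{V_{\mathcal{CS}}}(x_V), f_V(x_V)\right)
&= \int_{\mathbb{R}^d} f_V(x) \log_2 \frac{f_V(x)}{f_{V_{\mathcal{CS}}}(x)}\, dx \\
&= \int_{\mathbb{R}^d} f_V(x) \log_2 f_V(x)\, dx - \int_{\mathbb{R}^d} f_V(x) \log_2 f_{V_{\mathcal{CS}}}(x)\, dx \\
&= -H(X) - \int_{\mathbb{R}^d} f_V(x) \log_2 \frac{\prod_{K \in \mathcal{C}} f_K(x_K)}{\prod_{S \in \mathcal{S}} \left(f_S(x_S)\right)^{\nu_S - 1}}\, dx \\
&= -H(X) - \int_{\mathbb{R}^d} f_V(x) \left[ \log_2 \prod_{K \in \mathcal{C}} f_K(x_K) - \log_2 \prod_{S \in \mathcal{S}} \left(f_S(x_S)\right)^{\nu_S - 1} \right] dx \\
&= -H(X) - \int_{\mathbb{R}^d} f_V(x) \log_2 \prod_{K \in \mathcal{C}} f_K(x_K)\, dx + \int_{\mathbb{R}^d} f_V(x) \log_2 \prod_{S \in \mathcal{S}} \left(f_S(x_S)\right)^{\nu_S - 1} dx.
\end{aligned}$$

Since $\bigcup_{K \in \mathcal{C}} K = V$ and each variable belongs once more to the clusters
than to the separators (each separator $S$ being counted $\nu_S - 1$ times), we have
$\prod_{K \in \mathcal{C}} \prod_{i \in K} f_i(x_i) = \prod_{i=1}^{d} f_i(x_i) \prod_{S \in \mathcal{S}} \left( \prod_{i \in S} f_i(x_i) \right)^{\nu_S - 1}$.
Hence, by adding and subtracting

$$\int_{\mathbb{R}^d} f_V(x) \log_2 \prod_{K \in \mathcal{C}} \prod_{i \in K} f_i(x_i)\, dx$$

we obtain

$$\begin{aligned}
KL\left(f_{V_{\mathcal{CS}}}(x), f_V(x)\right)
&= -H(X) - \int_{\mathbb{R}^d} f_V(x) \log_2 \frac{\prod_{K \in \mathcal{C}} f_K(x_K)}{\prod_{K \in \mathcal{C}} \prod_{i \in K} f_i(x_i)}\, dx \\
&\quad + \int_{\mathbb{R}^d} f_V(x) \log_2 \frac{\prod_{S \in \mathcal{S}} \left(f_S(x_S)\right)^{\nu_S - 1}}{\prod_{S \in \mathcal{S}} \left( \prod_{i \in S} f_i(x_i) \right)^{\nu_S - 1}}\, dx
 - \int_{\mathbb{R}^d} f_V(x) \log_2 \prod_{i=1}^{d} f_i(x_i)\, dx \\
&= -H(X) - \int_{\mathbb{R}^d} f_V(x) \sum_{K \in \mathcal{C}} \log_2 \frac{f_K(x_K)}{\prod_{i \in K} f_i(x_i)}\, dx \\
&\quad + \int_{\mathbb{R}^d} f_V(x) \sum_{S \in \mathcal{S}} \log_2 \left[ \frac{f_S(x_S)}{\prod_{i \in S} f_i(x_i)} \right]^{\nu_S - 1} dx
 - \int_{\mathbb{R}^d} f_V(x) \sum_{i=1}^{d} \log_2 f_i(x_i)\, dx.
\end{aligned}$$

Since $f_K(x_K)$, $f_S(x_S)$, $f_i(x_i)$ are consistent marginals of $f_V(x)$ we have the
following relations:

$$\int_{\mathbb{R}^d} f_V(x) \sum_{K \in \mathcal{C}} \log_2 \frac{f_K(x_K)}{\prod_{i \in K} f_i(x_i)}\, dx
= \sum_{K \in \mathcal{C}} \int_{\mathbb{R}^k} f_K(x_K) \log_2 \frac{f_K(x_K)}{\prod_{i \in K} f_i(x_i)}\, dx_K
= \sum_{K \in \mathcal{C}} I(X_K), \tag{8}$$

$$\int_{\mathbb{R}^d} f_V(x) \sum_{S \in \mathcal{S}} \log_2 \left[ \frac{f_S(x_S)}{\prod_{i \in S} f_i(x_i)} \right]^{\nu_S - 1} dx
= \sum_{S \in \mathcal{S}} (\nu_S - 1) \int_{\mathbb{R}^{k-1}} f_S(x_S) \log_2 \frac{f_S(x_S)}{\prod_{i \in S} f_i(x_i)}\, dx_S
= \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(X_S), \tag{9}$$

$$-\int_{\mathbb{R}^d} f_V(x) \sum_{i=1}^{d} \log_2 f_i(x_i)\, dx
= \sum_{i=1}^{d} \left( -\int_{-\infty}^{\infty} f_i(x_i) \log_2 f_i(x_i)\, dx_i \right)
= \sum_{i=1}^{d} H(X_i), \tag{10}$$

where $I(X_K)$, $I(X_S)$ are the information contents (see [10]) of the random vectors
$X_K$ and $X_S$ corresponding to the index sets $K \in \mathcal{C}$ and $S \in \mathcal{S}$.

Taking into account relations (8), (9) and (10) we obtain:

$$KL\left(f_{V_{\mathcal{CS}}}(x), f_V(x)\right) = \sum_{i=1}^{d} H(X_i) - H(X) - \left[ \sum_{K \in \mathcal{C}} I(X_K) - \sum_{S \in \mathcal{S}} (\nu_S - 1)\, I(X_S) \right].$$

As we know that

$$\sum_{i=1}^{d} H(X_i) - H(X) = I(X),$$

we obtain formula (7), and this proves the theorem.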

It is easy to see that $I(X)$ does not depend on the structure of the
junction tree. A consequence of Theorem 8 is the following remark.

Remark 5 The probability density $f_{V_{\mathcal{CS}}}$ of the form (6) which is the best
fitting cherry-wine to the real probability density $f_V$ over all possible
R-vines truncated at level k maximizes the following difference:

$$\sum_{K \in \mathcal{C}_{ch}} I(X_K) - \sum_{S \in \mathcal{S}_{ch}} (\nu_S - 1)\, I(X_S).$$

Now we make some observations on the corresponding copula densities.
For two variables it was shown (see [8] and [27]) that:

$$I(X, Y) = \int_{[0;1]^2} c(u, v) \log_2 c(u, v)\, du\, dv,$$

which means that the information content is equivalent to the "copula entropy"
concept introduced in [27].
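A short derivation shows where this identity comes from. The notation is introduced here only for illustration: $f$ is the joint density of $(X, Y)$, $f_X$, $f_Y$ are its marginal densities and $F_X$, $F_Y$ the corresponding distribution functions. By Sklar's theorem $f(x, y) = c\left(F_X(x), F_Y(y)\right) f_X(x) f_Y(y)$, so that, with the substitution $u = F_X(x)$, $v = F_Y(y)$,

$$\begin{aligned}
I(X, Y) &= H(X) + H(Y) - H(X, Y) = \int_{\mathbb{R}^2} f(x, y) \log_2 \frac{f(x, y)}{f_X(x) f_Y(y)}\, dx\, dy \\
&= \int_{\mathbb{R}^2} c\left(F_X(x), F_Y(y)\right) \log_2 c\left(F_X(x), F_Y(y)\right) f_X(x) f_Y(y)\, dx\, dy \\
&= \int_{[0;1]^2} c(u, v) \log_2 c(u, v)\, du\, dv .
\end{aligned}$$

The last integral is the negative of the entropy of the copula density, which is why the information content depends on the copula only and not on the marginals.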

Generalizing this for the variables involved in the sets $K$ and $S$ we have:

$$I(X_K) = \int_{[0;1]^k} c(u_{X_K}) \log_2 c(u_{X_K})\, du_{X_K} = -H(c_{X_K}),$$

$$I(X_S) = \int_{[0;1]^{k-1}} c(u_{X_S}) \log_2 c(u_{X_S})\, du_{X_S} = -H(c_{X_S}).$$

Using the above assertions in Theorem 8, the Kullback-Leibler divergence can be
expressed by means of copula entropies:

$$KL\left(f_{V_{\mathcal{CS}}}(x_V), f_V(x_V)\right) = -H(c_{X_V}) + \sum_{K \in \mathcal{C}_{ch}} H(c_{X_K}) - \sum_{S \in \mathcal{S}_{ch}} (\nu_S - 1)\, H(c_{X_S}).$$

Remark 6 The cherry-wine copula density $c_{V_{\mathcal{CS}}}$ associated to the best
fitting cherry-wine probability density $f_{V_{\mathcal{CS}}}$ minimizes the following
difference over all possible R-vines truncated at level k:

$$\sum_{K \in \mathcal{C}_{ch}} H(c_{X_K}) - \sum_{S \in \mathcal{S}_{ch}} (\nu_S - 1)\, H(c_{X_S}).$$



6 Conclusion 

In this paper we gave an alternative definition of Regular-vines using the
concept of the t-cherry junction tree. We introduced the cherry-wine structure (a
truncated R-vine assigned to a t-cherry probability distribution). We gave an
algorithm for constructing an R-vine truncated at level k starting from special
k-th order t-cherry junction trees. The problem of inference was also discussed.
We developed a method for obtaining a good factorization (one which exploits
conditional independences) starting from sample data. In the last section we
discussed some theoretical properties of the best fitting truncated R-vine. In
the future we plan to extend our algorithm to the general case.